Exploiting Value Prediction for Fault Tolerance
Authors
Abstract
Technology scaling has led to growing concerns about reliability in microprocessors. Currently, fault tolerance techniques rely on explicit redundant execution for fault detection or recovery, which incurs significant performance, power, or hardware overhead. This paper makes the observation that value predictability is a low-cost (albeit imperfect) form of program redundancy that can be exploited for fault tolerance. We propose to use the output of a value predictor to check the correctness of predicted instructions, and to treat any mismatch as an indicator that a fault has potentially occurred. On a mismatch, we trigger recovery using the same hardware mechanisms provided for misspeculation recovery. To reduce false positives that occur due to value mispredictions, we limit the number of instructions that are checked in two ways. First, we characterize fault vulnerability at the instruction level, and only apply value prediction to instructions that are highly susceptible to faults. Second, we use confidence estimation to quantify the predictability of instruction results, and apply value prediction accordingly. In particular, results from instructions with higher fault vulnerability are predicted even if they exhibit lower confidence, while results from instructions with lower fault vulnerability are predicted only if they exhibit higher confidence. Our experimental results show that such selective prediction significantly improves reliability without incurring large performance degradation.
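The selective checking policy described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's design: the abstract does not name the predictor type, counter width, or thresholds, so the last-value predictor, the 3-bit saturating confidence counter, the two threshold values, and all names (`SelectiveChecker`, `check`, `highly_vulnerable`) are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field

CONF_MAX = 7  # assumed 3-bit saturating confidence counter


@dataclass
class Entry:
    last_value: int = 0   # assumed last-value predictor state
    confidence: int = 0   # number of consecutive correct predictions, saturating


@dataclass
class SelectiveChecker:
    # Assumed thresholds: highly vulnerable instructions are checked
    # at lower confidence than less vulnerable ones.
    high_vuln_threshold: int = 2
    low_vuln_threshold: int = 6
    table: dict = field(default_factory=dict)

    def check(self, pc: int, result: int, highly_vulnerable: bool) -> bool:
        """Return True if a mismatch should trigger recovery."""
        entry = self.table.setdefault(pc, Entry())
        threshold = (self.high_vuln_threshold if highly_vulnerable
                     else self.low_vuln_threshold)

        # The result is only checked when confidence is high enough.
        checked = entry.confidence >= threshold
        mismatch = result != entry.last_value

        # Train the predictor with the observed result.
        if mismatch:
            entry.confidence = 0
            entry.last_value = result
        else:
            entry.confidence = min(CONF_MAX, entry.confidence + 1)

        return checked and mismatch


checker = SelectiveChecker()
for _ in range(3):
    checker.check(pc=0x40, result=5, highly_vulnerable=True)
    checker.check(pc=0x44, result=5, highly_vulnerable=False)

# Both instructions now produce an unexpected value (e.g. a corrupted result).
flag_high = checker.check(pc=0x40, result=9, highly_vulnerable=True)
flag_low = checker.check(pc=0x44, result=9, highly_vulnerable=False)
```

In this model a mismatch on the highly vulnerable instruction is flagged after only a short run of correct predictions, while the same mismatch on the less vulnerable instruction is ignored because its confidence has not yet reached the stricter threshold. A flagged mismatch may still be an ordinary value misprediction rather than a fault, which is why the abstract treats it only as a potential fault and reuses the misspeculation recovery path.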
Related resources
Exploiting Inherent Program Redundancy for Fault Tolerance
Title of dissertation: Exploiting Inherent Program Redundancy for Fault Tolerance Xuanhua Li, Doctor of Philosophy, 2009 Dissertation directed by: Professor Donald Yeung Department of Electrical and Computer Engineering Technology scaling has led to growing concerns about reliability in microprocessors. Currently, fault tolerance studies rely on creating explicitly redundant execution for fault...
Important Issues in Software Fault Prediction: A Road Map
Quality assurance tasks such as testing, verification and validation, fault tolerance, and fault prediction play a major role in software engineering activities. Fault prediction approaches are used when a software company needs to deliver a finished product while it has limited time and budget for testing it. In such cases, identifying and testing parts of the system that are more defect prone...
Exploiting Hidden Layer Modular Redundancy for Fault-Tolerance in Neural Network Accelerators
Neural network accelerators are an increasingly utilized component of heterogeneous multicore architectures. This new utilization stems from their capability to improve the power-performance of machine learning algorithms and emerging techniques like core state prediction and code approximation. We explore the exploitation of the inherently redundant structure of neural networks for fault-tole...
Exploiting Unused Storage Resources to Enhance Systems' Energy Efficiency, Performance, and Fault-Tolerance by Hai
Exploiting Unused Storage Resources to Enhance Systems’ Energy Efficiency, Performance, and Fault-Tolerance
Exploiting Omissive Faults in Synchronous Approximate Agreement
In a fault-tolerant distributed system, it is often necessary for nonfaulty processes to agree on the value of a shared data item. The criterion of Approximate Agreement does not require processes to achieve exact agreement on a value; rather, they need only agree to within a predefined numerical tolerance. Approximate Agreement can be achieved through convergent voting algorithms. Previous re...